Abstract

This paper shows that it is possible to improve the computational cost,
the memory requirements and the accuracy of the Quick Fourier Transform
(QFT) algorithm for power-of-two FFT (Fast Fourier Transform) by
introducing just a slight modification to this algorithm. The new algorithm
requires the same number of additions and multiplications as split-radix
3add/3mul, one of the most appreciated FFT algorithms in the literature,
while employing only half of the trigonometric constants. These results can
elevate the QFT approach to the level of the most widely used FFT
procedures. A new and quite general way to describe FFT algorithms, based
on signal types and on a particular notation, is also proposed and used,
highlighting its advantages.

1 Introduction

The Fast Fourier Transform (FFT) is a basic subject in signal processing, and
many FFT algorithms have been proposed in the literature @] to compute it. An
ideal FFT algorithm should have many desired characteristics, depending on
the application context (the most important usually are: low computational
cost, low memory requirements, high numerical accuracy, simplicity) but, to
date, no FFT algorithm is optimal in all of them. For this reason, new FFT
algorithms that offer a different compromise between these characteristics
are welcome even if they do not have the best theoretical computational
cost. For length N = 2^r the most popular algorithm is radix-2 [3], while a very
appreciated algorithm is split-radix [5], [11], [13], of which some interesting
variants exist [2], [S]. However, some algorithms more efficient than split-radix,
if the computational model measures efficiency by the required flops (floating
point operations), have recently appeared: the scaled split-radix [8] (also called
tangent FFT p]), the scaled odd-tail [10], and others are possible [7]. Another
good algorithm is the Quick Fourier Transform [5] (called 'classical QFT
algorithm' in this paper), a real-factor algorithm that uses few trigonometric
constants and has a good (not excellent) computational cost. In this paper we
show how to improve the computational cost, the memory requirements (using
the same few trigonometric constants), and the numerical accuracy of classical
QFT for N = 2^r, by adding a further, appropriate intermediate decomposition
to this algorithm. The characteristics of the new algorithm,
called improved QFT, make it a good choice for fixed-point implementation,
where it is a good alternative to split-radix 3mul-3add. In order to point out
the reason for these improvements, we introduce a new way, based on signal types
and on a particular notation, to describe FFT algorithms. This new approach
has many other advantages too (highlighted in the appendix), concerning many steps
of the 'algorithm life': the research and theoretical development of algorithms, the
exposition of algorithms, and the implementation of algorithms. Let us briefly
summarize the content of the paper. In sect. 2 we show the new way to describe
FFT algorithms that we use in this paper. In sect. 3 the reader can find the
basic elaborations we use. In sect. 4 and sect. 5 we describe the classical and the
improved QFT algorithm respectively. Finally, in sect. 6 we discuss the memory
requirements, the computational cost and the accuracy of the proposed
improved QFT algorithm.

2 A new approach to describe FFT algorithms

A new manner to describe FFT algorithms is used in this paper, and we could use
it for many other power-of-two FFT algorithms (almost all those that use the
'divide et impera' approach). This new approach is made of these components:

• use of new concepts: signal types, non-zero value time indices nze_n,
stored sto_n indices, stored sto_k indices, independent-value stored
harmonics ind_sto_k, storage size ln and lk parameters of a signal in the temporal
and frequency domain respectively. Some of these concepts are useful since
the FFT algorithms described in this paper create many descendant signals for
which we have to compute a pruned input and/or output transform.

• use of a mnemonic notation to describe the relevant characteristics of the signals
created inside the FFT algorithm.

• use of a table (as Tab. 1) to describe the characteristics of any signal type
used in an algorithm. It is worth keeping this table at hand while reading this
paper.

• use of a table (as Tab. 6) that describes the matching between each signal
and the array cells that store the signal, in an implementation of the
algorithm in a suitable programming language (useful if we want to write
the code of this FFT algorithm).

• use of the decomposition tree (as Fig. 2), a graphical representation that
shows both the concatenation of basic elaborations used by the functions,
and the signal types handled by an FFT algorithm.

2.1 Basic definitions 

• Signal (time) periodization N: the time period of the fundamental fre- 
quency of the transform applied to a signal. 
• DFT, DCT, DST transforms: three different ways to represent a signal
by superposing oscillations of stationary amplitude and stationary frequency.
Analytically speaking we can define them as follows:

$$S(k) = \mathrm{DFT}[s](k) = \sum_{n=0}^{N-1} s(n) \cdot e^{-i \theta \cdot n \cdot k}, \qquad k \in \{0, 1, 2, \ldots, (N-1)\} \tag{1}$$

$$S(k) = \mathrm{DCT}[s](k) = \sum_{n=0}^{N/2} s(n) \cdot \cos(\theta \cdot n \cdot k), \qquad k \in \{0, 1, 2, \ldots, (N/2)\} \tag{2}$$

$$S(k) = \mathrm{DST}[s](k) = \sum_{n=1}^{N/2-1} s(n) \cdot \sin(\theta \cdot n \cdot k), \qquad k \in \{1, 2, 3, \ldots, (N/2-1)\} \tag{3}$$

where θ is the angular frequency of the fundamental, defined as:

$$\theta = \frac{2\pi}{N} \tag{4}$$

and N is the periodization of the transform we apply to the signal. The
DCT and DST transforms are defined in compliance with the definitions
given in [6], with the only difference that here we describe them in terms of
the periodization N (while N is the half-periodization in [6]). We can call
them DCT-0 and DST-0, to distinguish them from other DCT and DST
types already defined in the literature (however they are similar to DCT-I
and DST-I respectively). Let us observe that for the DFT the concept of
periodization coincides with the usual notion of length. Moreover we can apply
the DFT both to real (RDFT) and complex (CDFT) valued signals.
With an abuse of notation, we will use the DFT, DCT, DST terms in the case
of pruned input and/or output too (when only a subset of the N values s(n)
is non-zero, or when only a subset of the N values S(k) is required).

• Conversion: the elaboration that produces a single child
signal from the mother one. An ideal conversion used inside an FFT
algorithm does not increase the number of indices n or k to handle, passing
from mother to child signal, and requires few flops.

• Decomposition: the elaboration that produces two or more
children signals from the mother one. An ideal decomposition used inside
an FFT algorithm does not increase the total number of indices n or k to
handle, passing from mother to children signals (thus each child signal has
both fewer n and fewer k indices to handle than its mother), and requires
few flops.

• Forward phase of a conversion or of a decomposition: this is the phase
where the time elements are processed. Therefore, when we apply it, the
known elements are those of the mother signal, and the unknown elements
those of the child signal (or signals).
• Backward phase of a conversion or of a decomposition: this is the phase
where the frequency-domain elements are processed. Therefore, when we
apply it, the known elements are those of the child signal (or signals), and
the unknown elements those of the mother signal.

• non-zero value time indices grouping nze_n(s) of a signal s: the group
of n indices where s(n) is not known a priori to be zero (nze = non-zero).
This definition is useful since the FFT algorithms described in this
paper create many descendant signals that contain many temporal samples
known a priori to be zero (pruned input), for which we therefore
compute a pruned transform. At the corresponding time instants
no processing and no memory are required,
which improves the overall efficiency of the algorithm. For these
reasons a processing that handles only the residual time indices can be used.

• stored sto_n(s) grouping of a signal s: the grouping of only those indices n whose
s(n) value we need (or find convenient) to store in memory (and thus to
compute too, if it is unknown). In the general case, the sto_n and nze_n
groupings can differ if a signal contains some dependent (e.g. identical
or opposite) values s(n), as happens when we handle symmetric or
antisymmetric signals (in this case we can store only the independent
values, that is only a subset of nze_n). However, the sto_n grouping
coincides both with the nze_n grouping and with the group of independent
temporal elements, in any signal used in this paper.

• stored sto_k(s) grouping of a signal s: the grouping of only those indices k whose
frequency-domain components we need (or find convenient) to store in
memory (and thus to compute too inside the FFT algorithm). In
particular, for any k outside the sto_k grouping, one of these two conditions
holds:

- we are not interested in calculating the corresponding
frequency-domain component (which can be different from zero), for example
since it can be obtained from the S(k) values of stored harmonics,
without any further computation.

- we are interested in calculating the corresponding frequency-domain
component, but we obtain its value from the values of the
frequency-domain components in other stored harmonics of the same signal,
without reserving array cells to store it. For example, if the RDFT[s]
computation is the goal, then k = N - 1 is not a required harmonic,
since we can obtain its complex frequency-domain component
S(k = N - 1) from S(k = 1) (in this case, storing the second half of the RDFT
frequency-domain signal too is only for completeness in the exposition
of the result, not a necessity of the FFT algorithm).

This definition is useful since the FFT algorithms described in this paper
create many pruned-output descendant signals.

• independent-value stored harmonics grouping ind_sto_k(s) of a signal s:
the subset of the sto_k(s) group (created by selecting k indices starting from
k = min(sto_k) and then increasing k) where any associated S(k) value
has at least one real component that is independent from the other ones of
this group. This grouping coincides with the sto_k grouping for any signal
type used in this paper, except for three signal types, used in classical
QFT, as we will see in sect. 2.2.2. This concept is needed to understand the
reasons for the inefficiencies of classical QFT versus improved QFT.

• storage size ln(s) of a signal s in the temporal domain: the number of array
cells, with real values, used to store the temporal s(n) values, with n ∈
sto_n(s), of the handled signal s. For any signal used in this paper ln is
uniquely determined by the sto_n(s) group, according to these relations:

$$\mathit{ln}(s) = \begin{cases} \mathrm{card}(\mathrm{sto\_n}(s)) & \text{for RDFT, DCT, DST} \\ 2 \cdot \mathrm{card}(\mathrm{sto\_n}(s)) & \text{for CDFT} \end{cases} \tag{5}$$

where card(A) is the cardinality of the set A.

• storage size lk(s) of a signal s in the frequency domain: the number of array
cells, with real values, used to store the frequency-domain S(k) values,
with k ∈ sto_k(s), of the handled signal s. For any signal used in this paper lk is
uniquely determined by the sto_k(s) group, according to these relations:

$$\mathit{lk}(s) = \begin{cases} \mathrm{card}(\mathrm{sto\_k}(s)) & \text{for DCT, DST} \\ 2 \cdot \mathrm{card}(\mathrm{sto\_k}(s)) & \text{for CDFT} \\ 2 \cdot [\mathrm{card}(\mathrm{sto\_k}(s)) - 1] & \text{for RDFT} \end{cases} \tag{6}$$

• signal type: the configuration of characteristics of a signal that contains
all the information we need to know about a signal both to choose the
theoretical basic elaboration to apply to it, and to write the software code
we apply to this signal inside the FFT algorithm (thus each signal type
is always handled with the same basic elaboration, in an FFT algorithm).
In this paper the applied transform and the sto_n, sto_k groupings are the only
information required to determine a signal type. By contrast, the
periodization N of a signal does not contribute to determining the 'signal type'
(two signals can belong to the same 'signal type' even if they have
different periodization N), since the recursive function we apply to a signal
does not depend on the periodization N. Tab. 1 lists every signal type used in
this paper.

• Decomposition Tree: a graphical representation of the FFT algorithm.
This representation has the advantage of showing both the
structure of the algorithm and any relevant characteristic of any descendant
signal created by the concatenation of basic elaborations used to build the
whole FFT algorithm, hiding the mathematical details of these basic
elaborations (which we can see in a separate section). The use of the word 'tree' is
justified by the fact that almost all power-of-two FFT algorithms, including
the innovative one described in the following, basically convert
and decompose the original (root) signal into different descendant
signals, thus giving rise to a decomposition tree (see for instance Figs.
2g).
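
As a quick illustration of the transform definitions in eq. (1), (2), (3) and (4), the following is a minimal sketch in Python that evaluates them directly from the sums (with O(N^2) cost, so it is only a reference, not an FFT). The function names dft, dct0 and dst0 and the test signals are our own assumptions, chosen only for this example; the sign of the DFT exponent follows eq. (1).

```python
import cmath
import math


def dft(s):
    """Eq. (1): DFT of a sequence s of length N, straight from the definition."""
    N = len(s)
    theta = 2 * math.pi / N  # eq. (4)
    return [sum(s[n] * cmath.exp(-1j * theta * n * k) for n in range(N))
            for k in range(N)]


def dct0(s, N):
    """Eq. (2): s holds s(0)..s(N/2); returns the list S(0)..S(N/2)."""
    theta = 2 * math.pi / N
    half = N // 2
    return [sum(s[n] * math.cos(theta * n * k) for n in range(half + 1))
            for k in range(half + 1)]


def dst0(s, N):
    """Eq. (3): uses s(1)..s(N/2 - 1); returns {k: S(k)} for k = 1..N/2 - 1."""
    theta = 2 * math.pi / N
    half = N // 2
    return {k: sum(s[n] * math.sin(theta * n * k) for n in range(1, half))
            for k in range(1, half)}


N = 8
theta = 2 * math.pi / N
impulse = [0.0, 1.0, 0.0, 0.0, 0.0]  # s(1) = 1 on n = 0..N/2

C = dct0(impulse, N)   # expected: cos(theta * k)
D = dst0(impulse, N)   # expected: sin(theta * k)
F = dft([1.0] * N)     # expected: N at k = 0, zero elsewhere
```

With a unit impulse at n = 1 the sums collapse to a single term, so the DCT and DST outputs reduce to cos(θ·k) and sin(θ·k), which makes the index ranges of eq. (2) and (3) easy to inspect.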
2.2 Signal types and notation

2.2.1 Signal types

In this paper we show that applying a small modification to the classical QFT
improves some of its characteristics. Understanding the reasons for this is not
simple, and requires analyzing the details of the computations, and of the
signals created step by step in the classical and in the new algorithm. In particular,
we need to focus on the sto_n and sto_k groups of some gradually created signals.
In fact, as we shall see, in the two algorithms shown in this paper we sometimes
use signals with identical characteristics. At other times we create signals
to which we apply the same transform (e.g. DCT) and whose sto_n and sto_k
groupings differ only by the presence or absence of a single sto_n index, or of a
single sto_k harmonic. We need to focus on these slight differences since they
are what makes the new algorithm more efficient than the classical one.
For this reason a new manner of describing FFT algorithms, one that focuses on the
'signal type' concept, is helpful. In particular (as already announced in sect. 2.1),
limiting the analysis to the two algorithms we are going to describe, we need
to compare (and to focus on) three signal characteristics at the same time: sto_n,
sto_k, and the applied transform, to determine the 'signal type' of a signal. The
idea of formalizing the concept of signal type, and of associating a unique signal
name to it in a systematic way in the description of the algorithms, brings many
other advantages in the development, exposition and implementation of algorithms.
Details of these advantages are given in the appendix. Let us stress that the signal
types used inside an algorithm are not known a priori, and can be determined
only by analyzing the mathematical details of the basic elaborations used, as we
will see in sect. 3.



2.2.2 Notation for signal types 

Instead of associating an arbitrary name to each signal type, we prefer to use a
mnemonic descriptive notation, which identifies each signal type by means of a
suitable sequence of symbols (see Tab. 1), according to these rules:

• the main symbol is 's' (s = signal) in any case.

• the first subscript symbol identifies the applied transform: 'cx' (complex),
're' (real), 'dc' (DCT), 'ds' (DST).

• the second subscript symbol refers to sto_n: 'o' means generic odd, 'e' or
'e1' are two different groupings of only even n indices, 't' or 't1' (generic
total) are two different groupings of both even and odd n indices.

• the third subscript symbol refers to sto_k: 'o' (generic odd), 'e' (generic
even), 't' or 't1' (generic total).

This notation highlights the parallelism in the elaborations used in the
corresponding recursive functions, in the DCT context and in the DST context,
inside both the classical and the improved QFT algorithms. In this way, for many functions
(except the dct_to_cla and dst_to_cla functions used in the classical QFT), we
can switch from the signal types used in the DCT context to the ones used in the DST
context of the same algorithm, simply by replacing the 'dc' subscript with the 'ds' subscript.
As a side effect of this notation, there is no bijective correspondence between a
single subscript symbol and a single feature of the signal (except for the 1st
subscript), but only between a sequence of subscript symbols and a signal type.
For example, the subscript 'e' referring to sto_k identifies:

• the group sto_k = {0, 2, ..., (N/2)} if it is used in the s_dc_te sequence of symbols.

• the group sto_k = {2, 4, ..., (N/2 - 2)} if it is used in the s_ds_te sequence of
symbols.

Notice that this notation for signal types does not need to distinguish
the 't' symbol (or any other symbol) depending on whether it refers to the
grouping sto_n or to the grouping sto_k (for example using t_n in
the first case and t_k in the second case), because we only need to
consider the position of the symbol in the notation to see whether it relates to sto_n or
sto_k. This choice has the advantage of making the name of each signal shorter.
Moreover, with this notation, reading a signal type name
immediately recalls many characteristics of the signal. For instance,
the term s_ds_t1o denotes the signal type to which
we apply the DST, having both even and odd residual time indices n,
and for which only some odd k harmonics are required. Tab. 1 reports all and
only the sequences of symbols (signal types) used in this paper (note
that only some of the feasible symbol combinations describe signal
types that actually occur in the addressed algorithms). Let us observe that
s_dc_t1e, s_dc_t1t, s_ds_t1o are the only signal types used with lk = ln + 1. This means
they store a real S(k) element that depends on the other ones of the same sto_k
group (and thus that ind_sto_k ≠ sto_k holds), because a pruned input signal
with only ln real independent temporal elements can have at most ln
independent real transformed elements.

These three signal types are used only in classical QFT, and some of them
cause inefficiencies, as we will see in sect. 4.9 and 6.1.
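
The storage sizes of eq. (5) and (6) are easy to evaluate once sto_n and sto_k are known. The following minimal Python sketch does so for a few signal types with N = 16. The helper name storage_sizes is hypothetical, and the index sets are our reading of Tab. 1 (in particular sto_n = {0, ..., N/4 - 1} for s_dc_t1e is an assumption derived from its ln = N/4 entry), so the snippet should be taken as an illustration rather than as part of the algorithm.

```python
def storage_sizes(transform, sto_n, sto_k):
    """Return (ln, lk), the number of real array cells, following eq. (5), (6)."""
    card_n, card_k = len(sto_n), len(sto_k)
    ln = 2 * card_n if transform == "CDFT" else card_n
    if transform in ("DCT", "DST"):
        lk = card_k
    elif transform == "CDFT":
        lk = 2 * card_k
    elif transform == "RDFT":
        lk = 2 * (card_k - 1)
    else:
        raise ValueError("unknown transform: " + transform)
    return ln, lk


N = 16
# (transform, sto_n, sto_k) for some signal types, as we read them from Tab. 1.
signal_types = {
    "s_cx_tt": ("CDFT", range(N), range(N)),
    "s_re_tt": ("RDFT", range(N), range(N // 2 + 1)),
    "s_dc_te": ("DCT", range(N // 4 + 1), range(0, N // 2 + 1, 2)),
    "s_dc_t1e": ("DCT", range(N // 4), range(0, N // 2 + 1, 2)),
}
sizes = {name: storage_sizes(*spec) for name, spec in signal_types.items()}
```

Under these assumptions s_dc_te gets ln = lk = N/4 + 1, while s_dc_t1e gets ln = N/4 and lk = N/4 + 1, that is, one stored harmonic more than the number of independent temporal elements.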



2.2.3 Notation for signals

Each signal used is described by means of a notation that slightly modifies the
notation used for the associated signal type, according to these rules:

• the 1st symbol is 's' for temporal signals, and 'S' for frequency-domain
signals.

• an optional subscript identifier (numbers and/or capital letters) can be
inserted after the 1st 's' or 'S' symbol, to distinguish the handled signal from
other signals of the same type used in the same context.

For example s_dc_tt, s_A_dc_tt and s_3.1_A_dc_tt are three different temporal signals
of the same type 's_dc_tt', while S_ds_ot, S_A_ds_ot and S_4.1_A_ds_ot are three different
frequency-domain signals of the same type 's_ds_ot'.
Table 1: Transform type, sto_n and sto_k groups, and storage sizes ln and lk
(in the temporal and frequency domain respectively) associated with each
signal type used in this paper.

Signal type  Transform  sto_n = nze_n                sto_k                        ln         lk
s_cx_tt      CDFT       {0, 1, 2, ..., (N - 1)}      {0, 1, 2, ..., (N - 1)}      2N         2N
s_re_tt      RDFT       {0, 1, 2, ..., (N - 1)}      {0, 1, 2, ..., (N/2)}        N          N
s_dc_tt      DCT        {0, 1, 2, ..., (N/2)}        {0, 1, 2, ..., (N/2)}        N/2 + 1    N/2 + 1
s_dc_et      DCT        {0, 2, 4, ..., (N/2)}        {0, 1, 2, ..., (N/4)}        N/4 + 1    N/4 + 1
s_dc_ot      DCT        {1, 3, 5, ..., (N/2 - 1)}    {0, 1, 2, ..., (N/4 - 1)}    N/4        N/4
s_dc_te      DCT        {0, 1, 2, ..., (N/4)}        {0, 2, 4, ..., (N/2)}        N/4 + 1    N/4 + 1
s_dc_to      DCT        {0, 1, 2, ..., (N/4 - 1)}    {1, 3, 5, ..., (N/2 - 1)}    N/4        N/4
s_dc_oe      DCT        {1, 3, 5, ..., (N/4 - 1)}    {0, 2, 4, ..., (N/4 - 2)}    N/8        N/8
s_dc_oo      DCT        {1, 3, 5, ..., (N/4 - 1)}    {1, 3, 5, ..., (N/4 - 1)}    N/8        N/8
s_dc_t1e     DCT        {0, 1, 2, ..., (N/4 - 1)}    {0, 2, 4, ..., (N/2)}        N/4        N/4 + 1
s_dc_t1t     DCT        {0, 1, 2, ..., (N/2 - 1)}    {0, 1, 2, ..., (N/2)}        N/2        N/2 + 1
s_ds_tt      DST        {1, 2, 3, ..., (N/2 - 1)}    {1, 2, 3, ..., (N/2 - 1)}    N/2 - 1    N/2 - 1
s_ds_et      DST        {2, 4, 6, ..., (N/2 - 2)}    {1, 2, 3, ..., (N/4 - 1)}    N/4 - 1    N/4 - 1
s_ds_te      DST        {1, 2, 3, ..., (N/4 - 1)}    {2, 4, 6, ..., (N/2 - 2)}    N/4 - 1    N/4 - 1
s_ds_to      DST        {1, 2, 3, ..., (N/4)}        {1, 3, 5, ..., (N/2 - 1)}    N/4        N/4
s_ds_ot      DST        {1, 3, 5, ..., (N/2 - 1)}    {1, 2, 3, ..., (N/4)}        N/4        N/4
s_ds_oe      DST        {1, 3, 5, ..., (N/4 - 1)}    {2, 4, 6, ..., (N/4)}        N/8        N/8
s_ds_oo      DST        {1, 3, 5, ..., (N/4 - 1)}    {1, 3, 5, ..., (N/4 - 1)}    N/8        N/8
s_ds_t1o     DST        {1, 2, 3, ..., (N/4 - 1)}    {1, 3, 5, ..., (N/2 - 1)}    N/4 - 1    N/4
[illegible]  DST        (f)                          i                            1          1




3 Some basic elaborations (decompositions and
conversions) shared by classical and improved
QFT

The algorithms developed in this paper share some decompositions and
conversions, applied to different signal types: separation of even harmonics
from odd ones, separation of even time indices from odd ones, even harmonics
halving, even time indices halving. Note that, in the time domain
(forward phase), such decompositions or conversions take the time
samples of the mother signal as known data and those of the derived signals as
unknowns. Conversely, in the frequency domain (backward
phase), the roles of known and unknown data are exchanged between mother and
derived signals. Moreover, we describe each elaboration without referring
to a specific signal type, since each basic elaboration is applied to many different
signal types in this paper.



3.1 Separation between even and odd time indices 

Let s_tn be the generic mother signal and s_on, s_en the two created children signals.
The time-domain equations corresponding to the separation between even
and odd time indices separate only the nze_n = sto_n indices of the
mother signal types (to which this decomposition is applied):

$$s_{e_n}(n) = \begin{cases} s_{t_n}(n) & \text{even } n,\ n \in \mathrm{nze\_n}(s_{t_n}) \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

$$s_{o_n}(n) = \begin{cases} s_{t_n}(n) & \text{odd } n,\ n \in \mathrm{nze\_n}(s_{t_n}) \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

Within the DCT it can be easily proved that this decomposition generates the
following equations (backward phase):

$$\mathrm{DCT}[s_{t_n}](k) = \mathrm{DCT}[s_{e_n}](k) + \mathrm{DCT}[s_{o_n}](k), \qquad k \in \mathrm{sto\_k}(s_{t_n}),\ k \in [0, N/4) \tag{9}$$

$$\mathrm{DCT}[s_{t_n}](N/2 - k) = \mathrm{DCT}[s_{e_n}](k) - \mathrm{DCT}[s_{o_n}](k), \qquad k \in \mathrm{sto\_k}(s_{t_n}),\ k \in [0, N/4) \tag{10}$$

$$\mathrm{DCT}[s_{t_n}](k) = \mathrm{DCT}[s_{e_n}](k), \qquad k \in \mathrm{sto\_k}(s_{t_n}) \cap \{N/4\} \tag{11}$$

Similarly, within the DST case, the same eq. (9), (10), (11) hold swapping s_en
and s_on. Here we list the children signal types (described in Tab. 1) obtained
by applying this decomposition to different mother signal types.

If s_tn = s_dc_tt holds then s_on = s_dc_ot and s_en = s_dc_et.

If s_tn = s_ds_tt holds then s_on = s_ds_ot and s_en = s_ds_et.

If s_tn = s_dc_t1t holds then s_on = s_dc_ot and s_en = s_dc_et.

Now we describe in detail how to obtain the signal type of the children signals
created by this basic elaboration, if the mother signal is s_tn = s_dc_tt, which
has sto_n(s_dc_tt) = nze_n(s_dc_tt) = {0, 1, 2, ..., N/2}, according to Tab. 1. Using
eq. (7), (8) we obtain nze_n(s_en) = {0, 2, ..., N/2} and nze_n(s_on) = {1, 3, ..., N/2 - 1}.
Moreover the mother signal has sto_k(s_dc_tt) = {0, 1, 2, ..., N/2} according
to Tab. 1. Thus eq. (9), (10), (11) require us to know (and to store) sto_k(s_en) =
{0, 1, ..., N/4} and sto_k(s_on) = {0, 1, ..., (N/4 - 1)}, computed by means of the DCT
in both cases. Combining this information we obtain s_on = s_dc_ot and s_en =
s_dc_et, according to Tab. 1. In a similar manner we can obtain the children signal
types handled in the other cases of this basic elaboration, or in the other basic
elaborations.
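
The backward phase of this decomposition can be checked numerically. The sketch below (Python, with a hypothetical helper dct that evaluates the pruned DCT of eq. (2) directly, and arbitrary test data) splits a mother signal of type s_dc_tt with N = 16 according to eq. (7), (8), transforms the two children only on their own sto_k groups, and rebuilds the mother spectrum with eq. (9), (10), (11).

```python
import math


def dct(s, N, ks):
    """Pruned DCT of eq. (2): s maps each nze_n index to its value, and the
    result is computed only for the harmonics listed in ks."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.cos(theta * n * k) for n, v in s.items())
            for k in ks}


N = 16
# Mother signal of type s_dc_tt: nze_n = {0, 1, ..., N/2}, arbitrary values.
s_t = {n: float((3 * n + 1) % 7) for n in range(N // 2 + 1)}

s_e = {n: v for n, v in s_t.items() if n % 2 == 0}  # eq. (7), type s_dc_et
s_o = {n: v for n, v in s_t.items() if n % 2 == 1}  # eq. (8), type s_dc_ot

S_e = dct(s_e, N, range(N // 4 + 1))  # sto_k = {0, ..., N/4}
S_o = dct(s_o, N, range(N // 4))      # sto_k = {0, ..., N/4 - 1}

S_t = {}
for k in range(N // 4):
    S_t[k] = S_e[k] + S_o[k]           # eq. (9)
    S_t[N // 2 - k] = S_e[k] - S_o[k]  # eq. (10)
S_t[N // 4] = S_e[N // 4]              # eq. (11)

direct = dct(s_t, N, range(N // 2 + 1))
max_err = max(abs(S_t[k] - direct[k]) for k in direct)
```

Note that the two children together store N/2 + 1 harmonics, exactly as many as the mother signal, in agreement with the requirement that an ideal decomposition does not increase the total number of indices to handle.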

3.2 Separation between even and odd harmonics 



This elaboration is dual to the one described in sect. 3.1. Let s_tk be the generic
mother signal and s_ek, s_ok the two created children output signals. Within the
DCT context, it can be easily proved that the separation between even and odd
harmonics generates the following time-domain relations:

$$s_{e_k}(n) = \begin{cases} s_{t_k}(n) + s_{t_k}(N/2 - n) & n \in \mathrm{nze\_n}(s_{t_k}),\ n \in [0, N/4 - 1] \\ s_{t_k}(n) & n \in \{N/4\} \cap \mathrm{nze\_n}(s_{t_k}) \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

$$s_{o_k}(n) = \begin{cases} s_{t_k}(n) - s_{t_k}(N/2 - n) & n \in \mathrm{nze\_n}(s_{t_k}),\ n \in [0, N/4 - 1] \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

while, in the DST case, we obtain the same eq. (12), (13) swapping
s_ek and s_ok. The frequency-domain equations corresponding to such
a decomposition (backward phase) recombine the sto_k indices of the
children signals s_ek and s_ok:

$$S_{t_k}(k) = \begin{cases} S_{e_k}(k) & \text{even } k,\ k \in \mathrm{sto\_k}(s_{t_k}) \\ S_{o_k}(k) & \text{odd } k,\ k \in \mathrm{sto\_k}(s_{t_k}) \end{cases} \tag{14}$$

Here we list the children signal types (described in Tab. 1) obtained by applying
this decomposition to different mother signal types.

If s_tk = s_dc_tt holds then s_ok = s_dc_to and s_ek = s_dc_te.

If s_tk = s_dc_ot holds then s_ok = s_dc_oo and s_ek = s_dc_oe.

If s_tk = s_ds_tt holds then s_ok = s_ds_to and s_ek = s_ds_te.

If s_tk = s_ds_ot holds then s_ok = s_ds_oo and s_ek = s_ds_oe.
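
A numerical check of eq. (12), (13), (14) in the DCT context can be sketched as follows (Python; the helper dct and the test data are our own assumptions). A mother signal of type s_dc_tt with N = 16 is folded into its two children, each child is transformed only on its own harmonics, and the result is compared with the direct DCT of the mother signal.

```python
import math


def dct(s, N, ks):
    """Pruned DCT of eq. (2): s maps each nze_n index to its value, and the
    result is computed only for the harmonics listed in ks."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.cos(theta * n * k) for n, v in s.items())
            for k in ks}


N = 16
q = N // 4
# Mother signal of type s_dc_tt: nze_n = {0, 1, ..., N/2}, arbitrary values.
s_t = {n: float((5 * n + 2) % 11) for n in range(N // 2 + 1)}

# Eq. (12): even-harmonics child (type s_dc_te), nze_n = {0, ..., N/4}.
s_e = {n: s_t[n] + s_t[N // 2 - n] for n in range(q)}
s_e[q] = s_t[q]
# Eq. (13): odd-harmonics child (type s_dc_to), nze_n = {0, ..., N/4 - 1}.
s_o = {n: s_t[n] - s_t[N // 2 - n] for n in range(q)}

S_e = dct(s_e, N, range(0, N // 2 + 1, 2))  # even k only
S_o = dct(s_o, N, range(1, N // 2, 2))      # odd k only
S_t = {**S_e, **S_o}                        # eq. (14)

direct = dct(s_t, N, range(N // 2 + 1))
max_err = max(abs(S_t[k] - direct[k]) for k in direct)
```

The folding costs one addition or subtraction per pair of samples, and each child has roughly half of the time indices and half of the harmonics of its mother.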

3.3 Even Harmonics Halving 

The generic mother signal s_ek, characterized by periodization N, is converted
into the child signal s_tk, with time periodization N_A = (N/2), whose harmonics
of interest are both even and odd, and are obtained by halving the index of each
even harmonic of s_ek, keeping the associated frequency-domain
components unchanged. It can be easily proved that this corresponds to the following
temporal relation:

$$s_{t_k}(n) = \begin{cases} s_{e_k}(n) & n \in \mathrm{nze\_n}(s_{e_k}) \\ 0 & \text{otherwise} \end{cases} \tag{15}$$

From a frequency-domain perspective (backward phase), in the DCT context,
the following relation holds:

$$\mathrm{DCT}[s_{e_k}](k = 2 \cdot k_A) = \mathrm{DCT}[s_{t_k}](k_A), \qquad k \in \mathrm{sto\_k}(s_{e_k}) \tag{16}$$

while in the DST context the same eq. (16) holds, replacing DCT with DST. Here
we list the children signal types (described in Tab. 1) obtained by applying this
conversion to different mother signal types.

If s_ek = s_dc_te holds then s_tk = s_dc_tt.

If s_ek = s_dc_oe holds then s_tk = s_dc_ot.

If s_ek = s_ds_te holds then s_tk = s_ds_tt.

If s_ek = s_ds_oe holds then s_tk = s_ds_ot.

If s_ek = s_dc_t1e holds then s_tk = s_dc_t1t.
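
The conversion of eq. (15), (16) involves no arithmetic at all: the samples are kept and only the periodization changes. A minimal Python sketch (helper dct and data are assumptions of ours) makes this explicit for a mother signal of type s_dc_te with N = 16.

```python
import math


def dct(s, N, ks):
    """Pruned DCT of eq. (2): s maps each nze_n index to its value, and the
    result is computed only for the harmonics listed in ks."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.cos(theta * n * k) for n, v in s.items())
            for k in ks}


N = 16
N_A = N // 2
# Mother signal of type s_dc_te: nze_n = {0, ..., N/4}, sto_k = {0, 2, ..., N/2}.
s_e = {n: float(n + 1) for n in range(N // 4 + 1)}

s_t = dict(s_e)  # eq. (15): same samples, now read with periodization N_A

S_e = dct(s_e, N, range(0, N // 2 + 1, 2))  # even harmonics, periodization N
S_t = dct(s_t, N_A, range(N_A // 2 + 1))    # all harmonics, periodization N_A

# Eq. (16): harmonic 2*k_A of the mother equals harmonic k_A of the child.
max_err = max(abs(S_e[2 * k_A] - S_t[k_A]) for k_A in S_t)
```

The child has nze_n = sto_k = {0, ..., N_A/2}, so it is of type s_dc_tt with periodization N_A, as listed above.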

3.4 Even Time Indices Halving 

This elaboration is dual to the one described in sect. 3.3. The generic mother
signal s_en, containing only some even time indices, is converted into the child
signal s_tn, with periodization N_A = (N/2), with both even and odd time indices.
The signal s_tn is obtained by halving every even n index of s_en, keeping the
associated temporal values unchanged:

$$s_{t_n}(n_A) = \begin{cases} s_{e_n}(n = 2 \cdot n_A) & n \in \mathrm{nze\_n}(s_{e_n}) \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

From a frequency-domain perspective, in the DCT context, the following relation
holds:

$$\mathrm{DCT}[s_{e_n}](k) = \mathrm{DCT}[s_{t_n}](k), \qquad k \in \mathrm{sto\_k}(s_{e_n}) \tag{18}$$

while in the DST context the same eq. (18) holds, replacing DCT with DST. Here
we list the children signal types (described in Tab. 1) obtained by applying this
conversion to different mother signal types.

If s_en = s_dc_et holds then s_tn = s_dc_tt.

If s_en = s_ds_et holds then s_tn = s_ds_tt.
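
Eq. (17), (18) can be illustrated in the same way (Python sketch; the helper dct and the data are our own assumptions). Here the mother signal is of type s_dc_et with N = 16, and the child is read with periodization N_A = N/2 after halving each time index.

```python
import math


def dct(s, N, ks):
    """Pruned DCT of eq. (2): s maps each nze_n index to its value, and the
    result is computed only for the harmonics listed in ks."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.cos(theta * n * k) for n, v in s.items())
            for k in ks}


N = 16
N_A = N // 2
# Mother signal of type s_dc_et: nze_n = {0, 2, ..., N/2}, sto_k = {0, ..., N/4}.
s_e = {n: float((n * n) % 5 + 1) for n in range(0, N // 2 + 1, 2)}

s_t = {n // 2: v for n, v in s_e.items()}  # eq. (17): n_A = n / 2

S_e = dct(s_e, N, range(N // 4 + 1))    # periodization N
S_t = dct(s_t, N_A, range(N // 4 + 1))  # periodization N_A, same harmonics

# Eq. (18): the harmonic index is unchanged by this conversion.
max_err = max(abs(S_e[k] - S_t[k]) for k in S_e)
```

Since N/4 = N_A/2, the child has nze_n = sto_k = {0, ..., N_A/2} and is therefore of type s_dc_tt with periodization N_A.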



3.5 Notes on the used basic elaborations

We highlight that eq. (7), (8), (12), (13), (15), (17) prove that the basic
elaborations used in this paper create descendant signals with many values
s(n) = 0 known a priori, which therefore do not need to be computed or stored. Moreover
eq. (9), (10), (11), (14), (16), (18) prove that we can compute the S(k) values of the mother
signal of each basic elaboration by computing (and storing) frequency-domain
values in just a subset (sto_k) of the harmonics of the descendant signals created by
each basic elaboration. These observations justify the choice of creating
new concepts, and a notation, that focus on the sto_n and sto_k groupings of each
created pruned input and/or output signal.



4 The classical QFT algorithm 

The QFT algorithm [6] (here denoted as classical QFT to distinguish it from
the improved QFT algorithm) is a real-factor algorithm which has met with
significant success. We can describe it in terms of six functions calling each
other, if its goal is the computation of the CDFT. Although it can be
found in [6], here we present it using the new terms (signal types, notation, basic
elaborations) developed in the previous sections. In this way the differences from
the new algorithm (proposed in the next section), as well as the reasons for the inefficiency
of the classical version, become clearly evident.

4.1 The cdft_cla function 

The input signal is of type s_cx_tt. Let N be its length (equal to the periodization).
If N = 2 then the CDFT definition is directly applied, otherwise the CDFT
calculation is decomposed into two RDFT transforms, relative to the children output
signals s_1_re_tt and s_2_re_tt, both of length N (equal to the periodization), that we
manage by means of the rdft_cla function. The two children signals are created
according to the following time-domain relations:

$$s_{1\_re\_tt}(n) = \Re[s_{cx\_tt}(n)], \qquad n \in \{0, 1, 2, \ldots, (N-1)\}$$

$$s_{2\_re\_tt}(n) = \Im[s_{cx\_tt}(n)], \qquad n \in \{0, 1, 2, \ldots, (N-1)\}$$

We can prove that the above time-domain relations correspond to the following
frequency-domain relationships (backward phase):

$$\Re\{\mathrm{CDFT}[s_{cx\_tt}]\}(k) = \Re\{\mathrm{RDFT}[s_{1\_re\_tt}]\}(k) - \Im\{\mathrm{RDFT}[s_{2\_re\_tt}]\}(k), \qquad k \in \{1, 2, \ldots, (N/2 - 1)\}$$

$$\Re\{\mathrm{CDFT}[s_{cx\_tt}]\}(N - k) = \Re\{\mathrm{RDFT}[s_{1\_re\_tt}]\}(k) + \Im\{\mathrm{RDFT}[s_{2\_re\_tt}]\}(k), \qquad k \in \{1, 2, \ldots, (N/2 - 1)\}$$

$$\Re\{\mathrm{CDFT}[s_{cx\_tt}]\}(k) = \Re\{\mathrm{RDFT}[s_{1\_re\_tt}]\}(k), \qquad k \in \{0, (N/2)\}$$

$$\Im\{\mathrm{CDFT}[s_{cx\_tt}]\}(k) = \Im\{\mathrm{RDFT}[s_{1\_re\_tt}]\}(k) + \Re\{\mathrm{RDFT}[s_{2\_re\_tt}]\}(k), \qquad k \in \{1, 2, \ldots, (N/2 - 1)\}$$

$$\Im\{\mathrm{CDFT}[s_{cx\_tt}]\}(N - k) = -\Im\{\mathrm{RDFT}[s_{1\_re\_tt}]\}(k) + \Re\{\mathrm{RDFT}[s_{2\_re\_tt}]\}(k), \qquad k \in \{1, 2, \ldots, (N/2 - 1)\}$$

$$\Im\{\mathrm{CDFT}[s_{cx\_tt}]\}(k) = \Re\{\mathrm{RDFT}[s_{2\_re\_tt}]\}(k), \qquad k \in \{0, (N/2)\}$$
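
The backward phase of cdft_cla can be verified with a few lines of Python. In this sketch the two RDFTs are evaluated with a plain O(N^2) DFT (the helper dft is ours and only stands in for the rdft_cla function), and only their harmonics k = 0, ..., N/2 are used to rebuild the whole CDFT, as in the relations above.

```python
import cmath
import math


def dft(s):
    """Direct evaluation of eq. (1) for a sequence s of length N."""
    N = len(s)
    theta = 2 * math.pi / N
    return [sum(s[n] * cmath.exp(-1j * theta * n * k) for n in range(N))
            for k in range(N)]


N = 8
s = [complex(n % 3, (2 * n + 1) % 5) for n in range(N)]  # arbitrary s_cx_tt

R1 = dft([z.real for z in s])  # stands in for RDFT[s_1_re_tt]
R2 = dft([z.imag for z in s])  # stands in for RDFT[s_2_re_tt]

S = [0j] * N
for k in (0, N // 2):
    S[k] = complex(R1[k].real, R2[k].real)
for k in range(1, N // 2):
    S[k] = complex(R1[k].real - R2[k].imag, R1[k].imag + R2[k].real)
    S[N - k] = complex(R1[k].real + R2[k].imag, -R1[k].imag + R2[k].real)

direct = dft(s)
max_err = max(abs(a - b) for a, b in zip(S, direct))
```

Only the stored harmonics sto_k = {0, ..., N/2} of each real child are read, which is why lk = N for the type s_re_tt.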

4.2 The rdft_cla function in classical QFT

The input signal is of type s_re_tt. Let N be its length (equal to the periodization). If
N = 2 then we apply the RDFT definition, otherwise we decompose the RDFT
calculation into the calculation of a DCT (applied to the child signal s_dc_tt
of periodization N, that we manage through the dct_cla function) and a DST
(applied to the child signal s_ds_tt of periodization N, that we manage through
the dst_cla function). We can prove that the two output time-domain children
signals are created by means of these equations:

$$s_{dc\_tt}(n) = \begin{cases} s_{re\_tt}(n) + s_{re\_tt}(N - n) & n \in \{1, 2, 3, \ldots, (N/2 - 1)\} \\ s_{re\_tt}(n) & n \in \{0, (N/2)\} \\ 0 & \text{otherwise} \end{cases}$$

$$s_{ds\_tt}(n) = \begin{cases} s_{re\_tt}(n) - s_{re\_tt}(N - n) & n \in \{1, 2, 3, \ldots, (N/2 - 1)\} \\ 0 & \text{otherwise} \end{cases}$$

corresponding to the following frequency-domain relationships (backward phase):

$$\Re\{\mathrm{RDFT}[s_{re\_tt}]\}(k) = \mathrm{DCT}[s_{dc\_tt}](k), \qquad k \in \{0, 1, 2, \ldots, (N/2)\}$$

$$\Im\{\mathrm{RDFT}[s_{re\_tt}]\}(k) = -\mathrm{DST}[s_{ds\_tt}](k), \qquad k \in \{1, 2, 3, \ldots, (N/2 - 1)\}$$
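
The split of the RDFT into one DCT and one DST can be checked as follows (Python sketch; dft, dct and dst are hypothetical helpers that evaluate eq. (1), (2), (3) directly and only stand in for the recursive functions).

```python
import cmath
import math


def dft(s):
    """Direct evaluation of eq. (1) for a sequence s of length N."""
    N = len(s)
    theta = 2 * math.pi / N
    return [sum(s[n] * cmath.exp(-1j * theta * n * k) for n in range(N))
            for k in range(N)]


def dct(s, N, ks):
    """Pruned DCT of eq. (2); s maps each nze_n index to its value."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.cos(theta * n * k) for n, v in s.items())
            for k in ks}


def dst(s, N, ks):
    """Pruned DST of eq. (3); s maps each nze_n index to its value."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.sin(theta * n * k) for n, v in s.items())
            for k in ks}


N = 16
h = N // 2
s_re = [float((7 * n + 3) % 10) for n in range(N)]  # arbitrary s_re_tt

# Forward phase: the two children signals s_dc_tt and s_ds_tt.
s_dc = {n: s_re[n] + s_re[N - n] for n in range(1, h)}
s_dc[0] = s_re[0]
s_dc[h] = s_re[h]
s_ds = {n: s_re[n] - s_re[N - n] for n in range(1, h)}

# Backward phase: real part from the DCT, imaginary part from minus the DST.
C = dct(s_dc, N, range(h + 1))
D = dst(s_ds, N, range(1, h))

R = dft(s_re)
err_re = max(abs(R[k].real - C[k]) for k in C)
err_im = max(abs(R[k].imag + D[k]) for k in D)
```

The two children store (N/2 + 1) + (N/2 - 1) = N real samples in total, the same as the mother signal.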



4.3 The dct_cla function 



The input signal is of type s_dc_tt. Let N be its periodization. If N = 2 then we
directly apply the DCT definition, otherwise the mother signal s_dc_tt is first
decomposed by separating the even and odd harmonics (as shown in sect. 3.2), creating
the two children signals s_dc_te and s_dc_to, both with periodization equal to N.
Afterwards, the signal s_dc_te is converted into the signal s_A_dc_tt, having
periodization N_A = (N/2), by halving each even k index (as shown in sect. 3.3).
Therefore we obtain the two output signals s_A_dc_tt and s_dc_to, that we handle
by the dct_cla and dct_to_cla functions respectively.

4.4 The dst_cla function



This function operates similarly to the dct one. Moreover the input and output
signals have the same notation as in the dct case, apart from substituting dc
with ds (this is possible thanks to our notation, which highlights the analogy
between the DCT and DST management of signals).



4.5 The dct_to_cla function 

The input signal is of type s_dc_to. Let N be its periodization. If N = 4 then
we apply the DCT definition, otherwise we apply the following operations. First,
we transform the sequence of odd harmonics of the mother signal s_dc_to into the
sequence of even harmonics of a new signal, of type s_dc_tie, by multiplying the
signal s_dc_to by the secant function in the time domain:

\[ s_{dc\_tie}(n) = s_{dc\_to}(n) \cdot \frac{1}{2 \cos(\theta \cdot n)}, \qquad n \in \{0, 1, 2, \dots, (\tfrac{N}{4}-1)\} \tag{19} \]

In eq. (19) a special case holds for n = 0, since the multiplication can be
replaced by a binary shift. It can be proved that eq. (19) corresponds to the
following frequency-domain relationship (backward phase):

\[ \mathrm{DCT}[s_{dc\_to}](k) = \mathrm{DCT}[s_{dc\_tie}](k-1) + \mathrm{DCT}[s_{dc\_tie}](k+1), \qquad k \in \{1, 3, 5, \dots, (\tfrac{N}{2}-1)\} \tag{20} \]
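
The pair (19),(20) is an instance of the product-to-sum identity 2 cos(a) cos(b) = cos(a - b) + cos(a + b). The sketch below checks it numerically in Python. It assumes theta = 2*pi/N and a "pruned" DCT that sums only over the stored time indices; the helper `pruned_dct` and the test values are hypothetical.

```python
import math

def pruned_dct(s, ks, N):
    """DCT restricted to stored indices: sum_n s(n) * cos(2*pi*k*n/N)."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.cos(theta * k * n) for n, v in s.items())
            for k in ks}

N = 32
theta = 2 * math.pi / N
# Mother signal: stored time indices n = 0..N/4-1, odd harmonics requested.
s_dc_to = {n: math.sin(1.0 + 0.7 * n) for n in range(N // 4)}
# Eq. (19): multiply by the secant factor (n = 0 is just a halving).
s_dc_tie = {n: v / (2 * math.cos(theta * n)) for n, v in s_dc_to.items()}

odd_k = range(1, N // 2, 2)          # 1, 3, ..., N/2 - 1
even_k = range(0, N // 2 + 1, 2)     # 0, 2, ..., N/2
D_to = pruned_dct(s_dc_to, odd_k, N)
D_tie = pruned_dct(s_dc_tie, even_k, N)

# Eq. (20): each odd harmonic is the sum of its two even neighbours.
rebuilt = {k: D_tie[k - 1] + D_tie[k + 1] for k in odd_k}
max_err = max(abs(D_to[k] - rebuilt[k]) for k in odd_k)
```

Note that `D_tie` holds N/4 + 1 harmonics while `D_to` holds only N/4: the child needs one more harmonic than the mother signal provides.
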

Afterwards, we halve the even k indices of signal s_dc_tie, transforming it into
the signal s_A_dc_tit, which has periodization N_A = N/2 (as shown in sect. 3.3).
We then process this signal similarly to s_dc_tt in the dct function (sect. 4.3).
As a result we have two output signals, s_B_dc_tt (with periodization N_B = N/4)
and s_A_dc_to (with periodization N_A = N/2), which we handle with the dct_cla
and dct_to_cla functions respectively.



4.6 The dst_to_cla function

This function differs from the dct_to one in many aspects, since it applies
to the input signal s_ds_to. We first separate the sto_n(s_ds_to) indices into two
children signals: s_ds_tio and s_ds_eio (s_ds_eio inherits, and has, only the
residual time index n = N/4):

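\[ s_{ds\_tio}(n) = \begin{cases} s_{ds\_to}(n) & n \in \{1, 2, 3, \dots, (\tfrac{N}{4}-1)\} \\ {} & n = \tfrac{N}{4} \\ {} & n \in \{0, (\tfrac{N}{4}+1), (\tfrac{N}{4}+2), \dots, (N-1)\} \end{cases} \tag{21} \]

\[ s_{ds\_eio}(n) = \begin{cases} s_{ds\_to}(n) & n = \tfrac{N}{4} \\ {} & n \in \{0, 1, \dots, (\tfrac{N}{4}-1)\} \end{cases} \tag{22} \]

We can prove that eq. (21), (22) correspond to the following frequency-domain
relationship (backward phase):

\[ \mathrm{DST}[s_{ds\_to}](k) = \mathrm{DST}[s_{ds\_tio}](k) + (-1)^{\frac{k-1}{2}} \cdot \mathrm{DST}[s_{ds\_eio}](k = 1), \qquad k \in sto\_k(s_{ds\_to}) \tag{23} \]

where:

\[ \mathrm{DST}[s_{ds\_eio}](k = 1) = s_{ds\_eio}(n = \tfrac{N}{4}) \tag{24} \]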
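
The role of the isolated element n = N/4 can be illustrated with a short Python sketch. It assumes the pruned definition DST[s](k) = sum_n s(n) sin(2*pi*k*n/N) over the stored indices, and that s_ds_to stores n = 1, ..., N/4 with odd harmonics k = 1, 3, ..., N/2 - 1; the helper `pruned_dst` and the test values are hypothetical. With these assumptions the contribution of n = N/4 to harmonic k is s(N/4) * sin(k*pi/2), that is, plus or minus the single leaf value.

```python
import math

def pruned_dst(s, ks, N):
    """DST restricted to stored indices: sum_n s(n) * sin(2*pi*k*n/N)."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.sin(theta * k * n) for n, v in s.items())
            for k in ks}

N = 32
q = N // 4
s_ds_to = {n: math.cos(0.3 + 1.1 * n) for n in range(1, q + 1)}
s_ds_tio = {n: v for n, v in s_ds_to.items() if n != q}   # n = 1..N/4-1
s_ds_eio = {q: s_ds_to[q]}                                # only n = N/4

odd_k = range(1, N // 2, 2)
D_to = pruned_dst(s_ds_to, odd_k, N)
D_tio = pruned_dst(s_ds_tio, odd_k, N)
leaf = pruned_dst(s_ds_eio, [1], N)[1]    # equals s_ds_eio(N/4), as in (24)

# The leaf value enters every odd harmonic with alternating sign.
rebuilt = {k: D_tio[k] + (-1) ** ((k - 1) // 2) * leaf for k in odd_k}
max_err = max(abs(D_to[k] - rebuilt[k]) for k in odd_k)
```

Each of the N/4 odd harmonics needs one extra addition or subtraction for this leaf, which is the overhead the improved algorithm removes.
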



Then we transform s_ds_tio into s_ds_te, as done in (19),(20) (only the n indices
and signal types involved change), and then we transform s_ds_te into s_A_ds_tt
by halving each k index (as described in sect. 3.3). Unlike the dct_to_cla
function, we do not go on to apply to s_A_ds_tt the separation between even and
odd harmonics (described in sect. 3.2), since we have already created two output
signals in this function. Thus the two output signals are: s_A_ds_tt (handled by
the dst_cla function) and s_ds_eio, which is a leaf of the decomposition tree
(see Fig. 2) and, for this reason, is not handled by other functions.

Let us observe that creating the s_ds_eio signal means computing separately
(using the definition of DST, a very inefficient technique) the contribution of
the s_ds_to(n = N/4) temporal element to the frequency-domain components of
s_ds_to, since we cannot apply (19) to the n = N/4 case too (to avoid division
by zero).



4.7 The classical QFT: the decomposition tree 

The classical QFT recursive algorithm can be represented diagrammatically by the
decomposition tree reported in Fig. 2. Such a tree refers to calculating the
entire RDFT of an s_re_tt root signal, and reports four levels of decomposition
(the total number of decomposition levels depends on the signal periodization N).
In order to facilitate the identification of the different signals within the
decomposition tree, which could be difficult because the same functions and the
same signal types occur many times, each signal has been renamed (with respect
to the names used in this section), according both to sect. 2.2.3 and to the
following criterion: an n_1,n_2 identifier is placed in front of the subscript
symbols list, where n_1 denotes the decomposition level, and n_2 the signal
position within the decomposition level. It follows that the starting signal,
which is processed by the function rdft, changes its name from s_re_tt to
s_1,1_re_tt. Moreover the two output signals created by this function become
s_2,1_dc_tt and s_2,2_ds_tt. They are respectively processed by functions dct
and dst. The dct function (whose intermediate signals created inside it are also
reported in Fig. 2) generates the two output signals s_A_dc_tt (s_3,1_A_dc_tt)
and s_dc_to (s_3,2_dc_to), which are the input signals of functions dct_cla and
dct_to_cla respectively. The remaining part of the graph can be explained in a
similar way.



4.8 The classical QFT: computational cost 

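For CDFT calculation, the classical QFT presents the following computational
cost [6]:

\[ \mathrm{mul}(N) = N \cdot \log(N) - \tfrac{11}{4} \cdot N + 2 \]

\[ \mathrm{add}(N) = \tfrac{7}{2} \cdot N \cdot \log(N) - 4 \cdot N \]

\[ \mathrm{flop}(N) = \tfrac{9}{2} \cdot N \cdot \log(N) - \tfrac{27}{4} \cdot N + 2 \]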
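
As a cross-check, the sketch below evaluates these cost expressions in Python, taking the logarithm in base 2 and the coefficients 11/4, 7/2, 9/2 and 27/4; with these values the formulas reproduce the clas_QFT columns of Tab. 3, 4 and 5 for N >= 8. The function name `classical_qft_cost` is hypothetical.

```python
from fractions import Fraction

def classical_qft_cost(N):
    """Return (mul, add, flop) of the classical QFT for a CDFT of length N.

    N is assumed to be a power of two with N >= 8; r = log2(N).
    """
    r = N.bit_length() - 1
    mul = N * r - Fraction(11, 4) * N + 2
    add = Fraction(7, 2) * N * r - 4 * N
    return int(mul), int(add), int(mul + add)

for N in (8, 16, 32, 64):
    print(N, classical_qft_cost(N))
```

For example, N = 16 gives 22 multiplications, 160 additions and 182 flops, matching the tabulated values.
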
4.9 The classical QFT: memory requirements

The classical QFT algorithm requires (N/4 - 1) distinct trigonometric (secant)
constants: sec(θ · n) for n ∈ {0, 1, ..., (N/4 - 1)}. Moreover
the classical QFT can be implemented in-place only if our goal is the DST-0
computation (an implementation is in-place only if the memory size needed to
perform the algorithm operations, in addition to the memory area containing
the start signal, is fixed, that is, not dependent on the periodization N of the
handled root signal). In fact each function used in the DST context (intuitively
speaking) has these good characteristics:

• it doesn't increase the total ln and total lk to handle, passing from the input
mother signal to the two output descendant signals.

• it can be implemented using a fixed number of inner temporary variables,
which doesn't depend on the periodization N of the handled input signal.

• it uses basic elaborations whose conversions or combinations of temporal
(or frequency-domain) elements can intrinsically be implemented in-place
(independently of the algorithm in which they are inserted).

By contrast, the computation of DCT-0 or DFT cannot be implemented in-place
using the classical QFT, because of the dct_to_cla function, which increases
both the total sto_n indices and the total sto_k harmonics that we have to handle
(and to store in memory), passing from the input signal to the two output
descendant signals, each time this function is used. Here is the proof of this
statement. From Tab. 1, considering the specific periodization of each signal,
the following relations hold (from an input-output point of view) for the
dct_to_cla function:

\[ \mathrm{ln}(s_{dc\_to}) = \tfrac{N}{4} \qquad \mathrm{lk}(s_{dc\_to}) = \tfrac{N}{4} \]

\[ \mathrm{ln}(s_{A\_dc\_to}) = \tfrac{N_A}{4} = \tfrac{N}{8} \qquad \mathrm{lk}(s_{A\_dc\_to}) = \tfrac{N_A}{4} = \tfrac{N}{8} \]

\[ \mathrm{ln}(s_{B\_dc\_tt}) = \tfrac{N_B}{2} + 1 = \tfrac{N}{8} + 1 \qquad \mathrm{lk}(s_{B\_dc\_tt}) = \tfrac{N_B}{2} + 1 = \tfrac{N}{8} + 1 \]

Combining the previous values we obtain the thesis in the dct_to_cla case (the
two output signals together hold one element more than the input signal):

\[ \mathrm{ln}(s_{dc\_to}) = \mathrm{ln}(s_{A\_dc\_to}) + \mathrm{ln}(s_{B\_dc\_tt}) - 1 \tag{25} \]
\[ \mathrm{lk}(s_{dc\_to}) = \mathrm{lk}(s_{A\_dc\_to}) + \mathrm{lk}(s_{B\_dc\_tt}) - 1 \tag{26} \]

This implies an increase in the total number of elements to manage as the tree
decomposition progresses, and therefore high memory requirements (if N is
large), which also prevents an 'in place' implementation. The formal proof shown
above has a disadvantage: it doesn't explain the mechanism inside the dct_to_cla
function that produces eq. (25),(26). Here is the explanation of this mechanism.
The total lk increases because of (20), which forces us to know (and to store)
N/4 + 1 harmonics of the child signal s_dc_tie in order to compute only N/4
harmonics of the mother signal s_dc_to, in the backward phase. On the contrary,
eq. (19) doesn't increase ln. For this reason (19),(20) create a child signal
(s_dc_tie) with lk > ln (and thus with indsto_k ≠ sto_k too). Handling a signal
(s_dc_tie) with lk = ln + 1 is inefficient because it means we have to compute
and to store in memory a harmonic k whose S(k) value is linearly dependent on all
the other ones of the signal, since a pruned input signal with only ln real
independent temporal elements can have a maximum of ln independent real
transformed elements (thus, intuitively speaking, this additional harmonic
doesn't increase the information stored in the other harmonics).
The increase of the total ln inside the dct_to_cla function is caused by the
separation between even and odd harmonics of the s_dc_tit signal, used after
(19),(20). This theoretical elaboration increases the total ln because we apply
it to an atypical signal (s_dc_tit) that has ln ≠ lk (and thus indsto_k ≠ sto_k
too). In fact this problem is not intrinsic to the decomposition described in
sect. 3.2 since, applying this decomposition to other signal types, the total ln
and lk parameters are kept unaltered.



5 The improved QFT algorithm: basic idea and 
description 

5.1 The idea of improved QFT algorithm 

In the classical QFT algorithm, two main factors make the computational cost
high:

• the separate handling of the time-domain element s_ds_to(n = N/4) within the
DST calculation (we compute its contribution to the DST frequency-domain
components by means of the DST definition, to avoid the division by zero
required if we apply (19) to s_ds_to(n = N/4));

• the increasing global number of elements to be managed, both temporal
and frequency-domain, in the DCT context, as the signal decomposition proceeds
further, because of the dct_to_cla function, as shown in sect. 4.9.

The idea to improve this algorithm consists in applying both the separation
between even and odd n indices and the separation between even and odd
harmonics before transforming odd into even k indices. In this way we avoid the
growth of both temporal and frequency-domain elements to be managed as the
decomposition level rises (in the DCT context), as we will prove in sect. 6.1.
Moreover, in the DST context, the odd-to-even harmonics conversion will affect
signals with only odd n indices, avoiding the handling of the problematic even
index n = N/4, and thus avoiding the computation of its contribution to the
frequency-domain components using the definition of DST (a very inefficient
technique used in the dst_to_cla function of the classical QFT).

Let us stress that the separation between even and odd time indices can
be performed both before and after the even/odd harmonics separation. We
prefer to apply the temporal separation first, and then the frequency-domain
one, in order to minimize the number of distinct recursive functions involved.
According to these modifications, in the new QFT algorithm the odd-to-even
conversion is applied only to the s_dc_oo and s_ds_oo signal types.

5.2 Recursive description of improved QFT algorithm 

The improved QFT algorithm can be described in terms of 8 functions (cdft, rdft,
dct, dst, dct_ot, dst_ot, dct_oo, dst_oo) calling each other, if it is aimed at
the computation of the CDFT. The rdft and cdft functions coincide with the
rdft_cla and cdft_cla functions respectively used in the classical QFT algorithm.



5.2.1 The dct function

The input signal is of type s_dc_tt, with periodization N. If N = 2 we just apply
the DCT definition, otherwise we first separate the even temporal indices from
the odd ones (as shown in sect. 3.1), generating the signals s_dc_et and s_dc_ot,
characterized by periodization N. Then we convert the signal s_dc_et into the
signal s_A_dc_tt, with periodization N_A = N/2, by halving each even temporal
index n (as shown in sect. 3.4). At the end we have the two output signals
s_A_dc_tt and s_dc_ot handled by, respectively, the dct and the dct_ot function.
Let us observe that this function (like the dst function) contains the new
decomposition (the separation between odd and even time indices) introduced in
the improved QFT algorithm to avoid the problems of the classical QFT.
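
The separation between even and odd time indices can be illustrated numerically. The Python sketch below assumes the pruned definition DCT[s](k) = sum_n s(n) cos(2*pi*k*n/N) over the stored indices n = 0, ..., N/2 (sections 3.1 and 3.4 are not reproduced here, so this is only a plausible reading of them); `pruned_dct` and the test values are hypothetical.

```python
import math

def pruned_dct(s, ks, N):
    """DCT restricted to stored indices: sum_n s(n) * cos(2*pi*k*n/N)."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.cos(theta * k * n) for n, v in s.items())
            for k in ks}

N = 16
s_dc_tt = {n: math.sin(0.5 + 0.9 * n) for n in range(N // 2 + 1)}

# Separate even and odd time indices (same periodization N).
s_dc_et = {n: v for n, v in s_dc_tt.items() if n % 2 == 0}
s_dc_ot = {n: v for n, v in s_dc_tt.items() if n % 2 == 1}
# Halve the even time indices: the result has periodization N_A = N/2.
s_A_dc_tt = {n // 2: v for n, v in s_dc_et.items()}

ks = range(N // 2 + 1)
full = pruned_dct(s_dc_tt, ks, N)
even_part = pruned_dct(s_A_dc_tt, ks, N // 2)   # evaluated directly for all k
odd_part = pruned_dct(s_dc_ot, ks, N)
max_err = max(abs(full[k] - even_part[k] - odd_part[k]) for k in ks)
```

In the actual algorithm the harmonics of the halved signal above N_A/2 would be obtained by symmetry rather than recomputed; here they are evaluated directly to keep the check short. The element counts are preserved: N/4 + 1 even plus N/4 odd indices equal the N/2 + 1 of the input.
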

5.2.2 The dct_ot function

This function operates like the dct_cla function in the classical QFT, but it
applies to the s_dc_ot signal type (with periodization N), instead of the s_dc_tt
signal type. If N = 4 we just apply the DCT definition, otherwise we first
separate even from odd harmonics (as shown in sect. 3.2), generating the signals
s_dc_oe and s_dc_oo, both characterized by periodization N. Then the signal
s_dc_oe is converted into the signal s_A_dc_ot, having periodization N_A = N/2,
by halving each even index k (as shown in sect. 3.3). We therefore obtain two
output signals, s_A_dc_ot and s_dc_oo, handled by, respectively, the dct_ot and
dct_oo functions.

5.2.3 The dct_oo function 

This function operates like the dct_to_cla function in the classical QFT, but it
applies to the s_dc_oo signal type, instead of the s_dc_to signal type; this
slight difference is what improves the DCT computational cost. The input signal
is of type s_dc_oo, with periodization N. If N = 8 we just apply the DCT
definition, otherwise we first transform the odd harmonics into even ones,
generating the child signal s_dc_oe, with periodization N, by means of these
equations:

\[ s_{dc\_oe}(n) = s_{dc\_oo}(n) \cdot \frac{1}{2 \cos(\theta \cdot n)}, \qquad n \in \{0, 1, 2, \dots, (\tfrac{N}{4}-1)\} \tag{27} \]

\[ \mathrm{DCT}[s_{dc\_oo}](k = \tfrac{N}{4}-1) = \mathrm{DCT}[s_{dc\_oe}](k = \tfrac{N}{4}-2) \tag{28} \]

\[ \mathrm{DCT}[s_{dc\_oo}](k) = \mathrm{DCT}[s_{dc\_oe}](k-1) + \mathrm{DCT}[s_{dc\_oe}](k+1), \qquad k \in \{1, 3, 5, \dots, (\tfrac{N}{4}-1)\} \tag{29} \]

These relations are similar to eq. (19),(20), the only difference being the
signal types involved. Then we halve the even k indices of s_dc_oe (as shown in
sect. 3.3), generating the signal s_A_dc_ot, having periodization N_A = N/2.
From now on we handle the signal s_A_dc_ot as we handle the signal s_dc_ot in
function dct_ot of this algorithm, and therefore we create two output signals:
s_B_dc_ot (with periodization N_B = N/4, handled by function dct_ot) and
s_A_dc_oo (with periodization N_A = N/2, handled by function dct_oo).
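
The reason why a signal with only odd time indices needs no extra harmonic can be illustrated with a short Python sketch. It assumes theta = 2*pi/N, a pruned DCT summed over the stored indices, and that an s_dc_oo signal stores the odd indices n = 1, 3, ..., N/4 - 1 and the odd harmonics k = 1, 3, ..., N/4 - 1; these index sets are an assumption consistent with ln = lk = N/8, and `pruned_dct` is a hypothetical helper. Under these assumptions the harmonic k = N/4 of the converted signal vanishes, so the last odd harmonic needs only one neighbour.

```python
import math

def pruned_dct(s, ks, N):
    """DCT restricted to stored indices: sum_n s(n) * cos(2*pi*k*n/N)."""
    theta = 2 * math.pi / N
    return {k: sum(v * math.cos(theta * k * n) for n, v in s.items())
            for k in ks}

N = 32
theta = 2 * math.pi / N
q = N // 4
s_dc_oo = {n: math.sin(0.4 + 0.8 * n) for n in range(1, q, 2)}   # odd n only
s_dc_oe = {n: v / (2 * math.cos(theta * n)) for n, v in s_dc_oo.items()}

odd_k = range(1, q, 2)                 # 1, 3, ..., N/4 - 1
D_oo = pruned_dct(s_dc_oo, odd_k, N)
# k = N/4 is included only to show that it is (numerically) zero.
D_oe = pruned_dct(s_dc_oe, range(0, q + 1, 2), N)

vanishing = abs(D_oe[q])
special_case_err = abs(D_oo[q - 1] - D_oe[q - 2])
general_err = max(abs(D_oo[k] - (D_oe[k - 1] + D_oe[k + 1])) for k in odd_k)
```

With N/8 stored time indices and N/8 useful even harmonics (k = 0, 2, ..., N/4 - 2), the converted signal keeps ln = lk.
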

5.2.4 The dst, dst_ot, dst_oo functions

The functions dst, dst_ot, dst_oo apply the same chain of elaborations used in
the dct, dct_ot, dct_oo functions respectively, but applied to different input
signal types (s_ds_tt, s_ds_ot, s_ds_oo instead of s_dc_tt, s_dc_ot, s_dc_oo
respectively). Thus the signal types used in this dst family of functions have
the same notation as the corresponding ones used in the dct family, the only
difference being that each dc occurrence is replaced by ds. Moreover, two other
relevant differences occur:

• in the dst function the DST definition is applied for N = 4 instead of N = 2.

• in the dst_oo function, a special case holds for k = 1, instead of
k = (N/4 - 1), in eq. (28).

The improved QFT recursive algorithm can be represented diagrammatically by the
decomposition tree reported in Fig. 3.



6 Characteristics of the improved QFT algorithm

In this section we compare the characteristics of the improved QFT with those
of the classical QFT and of split-radix 3mul-3add.



6.1 Memory Requirements 



• The improved QFT algorithm requires few trigonometric constants: nearly the
same ones used in the classical QFT.

In fact it employs (N/4) distinct trigonometric constants: the same (N/4 - 1)
secant constants used in the classical QFT algorithm, plus the cos(π/4)
constant used in the dct_oo and dst_oo functions if N = 8 (the special case
where the definitions of DCT and DST respectively are applied). This is
a good characteristic since, for example, conjugate-pair split-radix 3add-3mul
requires twice as many real trigonometric constants.

• The improved QFT algorithm requires fewer memory cells than the classical QFT,
if the goal is the DCT or DFT computation.

The reason is that, in any recursive function (or in any level of
decomposition) of the new QFT algorithm, the total ln and lk parameters do not
increase, passing from the input to the descendant output signals (by contrast,
these parameters increase in the DCT context in the classical QFT, as seen
in sect. 4.9).

We prove this statement here only for the dct_oo function, since it plays
the same role that the 'ill' dct_to_cla function (the one that increases the
total ln and lk) has in the classical algorithm (since both functions contain
the conversion of an odd-harmonics signal into an even-harmonics signal,
multiplying the mother signal by the secant function).

From Tab. 1, considering the specific periodization of each created signal,
the following relations hold (from an input-output point of view) for the
dct_oo function:

\[ \mathrm{ln}(s_{dc\_oo}) = \mathrm{lk}(s_{dc\_oo}) = \tfrac{N}{8} \]

\[ \mathrm{ln}(s_{A\_dc\_oo}) = \tfrac{N_A}{8} = \tfrac{N}{16} \qquad \mathrm{lk}(s_{A\_dc\_oo}) = \tfrac{N_A}{8} = \tfrac{N}{16} \]

\[ \mathrm{ln}(s_{B\_dc\_ot}) = \tfrac{N_B}{4} = \tfrac{N}{16} \qquad \mathrm{lk}(s_{B\_dc\_ot}) = \tfrac{N_B}{4} = \tfrac{N}{16} \]

Combining the previous values we obtain the thesis (for the dct_oo case):

\[ \mathrm{ln}(s_{dc\_oo}) = \mathrm{ln}(s_{A\_dc\_oo}) + \mathrm{ln}(s_{B\_dc\_ot}) \tag{30} \]
\[ \mathrm{lk}(s_{dc\_oo}) = \mathrm{lk}(s_{A\_dc\_oo}) + \mathrm{lk}(s_{B\_dc\_ot}) \tag{31} \]

Similar relations hold for any other function used in the improved QFT
algorithm. Comparing (30),(31) with (25),(26), it follows that the new
algorithm requires fewer memory locations than the classical QFT in the DCT,
and thus also the DFT, context.



Let us analyze the mechanism that prevents eq. (25),(26) from arising in the
improved QFT. In the classical and in the improved QFT we convert an
odd-harmonics signal into an even-harmonics signal in the same manner:
multiplying the temporal signal by the secant function (only the signal types
involved change). However, applying this theoretical elaboration to s_dc_oo we
obtain a child signal (s_dc_oe) with ln = lk. By contrast, in the classical QFT,
applying the same conversion to the s_dc_to signal creates the s_dc_tie signal
type, which has lk > ln. Moreover, in the improved QFT, applying the separation
between even and odd harmonics (described in sect. 3.2) to s_dc_ot (instead of
the s_dc_tit used in the classical QFT) doesn't increase the total ln. For these
reasons all signal types created in the improved algorithm have ln = lk (and
indsto_k = sto_k too), and thus we remove the inefficiencies described in
sect. 4.9. We highlight once again that these slight differences between the
improved and the classical QFT are difficult to catch using a traditional
exposition, but are easy to catch using the new exposition approach used here,
which focuses on the sto_n and sto_k groupings of each created signal.

• The new QFT algorithm is eligible for an 'in place' implementation (but
finding an efficient code that implements it requires further work), because
each function used has the characteristics already described in sect. 4.9
for the functions used in the DST context in the classical QFT.
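
The element bookkeeping behind this comparison can be written down in a few lines. The Python sketch below is only an arithmetic illustration with hypothetical function names: it encodes the ln = lk values stated for the dct_to_cla function (sect. 4.9) and for the dct_oo function (this section), and returns the count of the input signal next to the combined count of its two output signals.

```python
def classical_dct_to_counts(N):
    """(input, outputs) element counts for dct_to_cla with periodization N."""
    parent = N // 4                  # s_dc_to
    child_a = (N // 2) // 4          # s_A_dc_to, N_A = N/2
    child_b = (N // 4) // 2 + 1      # s_B_dc_tt, N_B = N/4
    return parent, child_a + child_b

def improved_dct_oo_counts(N):
    """(input, outputs) element counts for dct_oo with periodization N."""
    parent = N // 8                  # s_dc_oo
    child_a = (N // 2) // 8          # s_A_dc_oo, N_A = N/2
    child_b = (N // 4) // 4          # s_B_dc_ot, N_B = N/4
    return parent, child_a + child_b

for N in (32, 64, 128):
    print(N, classical_dct_to_counts(N), improved_dct_oo_counts(N))
```

For every N the classical function hands one element more to its children than it received, while the improved function hands over exactly the same number.
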



6.2 Computational Cost 

The mathematical expressions of the detailed computational cost of the new
algorithm are reported in Tab. 2. Tab. 3, 4, 5 compare the computational cost of
the improved QFT with those of the classical QFT and of split-radix 3add-3mul
[12] in the CDFT case. It results that the improved QFT:

• requires fewer additions, multiplications and flops (and, for this reason, is
more efficient, in this model of computation) than the classical QFT, for
N ≥ 16.

• requires the same additions, multiplications and flops as split-radix
3add/3mul for any N = 2^r, but using half the trigonometric constants, as seen
before and according to [5],[12].

It can be shown that the improvements are more substantial in the DST than
in the DCT case, since, in the former case, we avoid using eq. (23).
Intuitively speaking, the improved QFT algorithm has a lower computational
cost than the classical QFT for the same reasons shown in sect. 6.1 for
memory requirements, since the computational cost is linearly related to the ln
and lk parameters of the handled signals. In fact, the more indices we handle,
the more arithmetic instructions we execute (storing indices that will not be
used again makes no sense).

6.3 Numerical Accuracy

We have tested and compared (see Fig. 1) the accuracy of the improved QFT, the
classical QFT and split-radix 3add-3mul, using Scilab 5.3.3 and 64-bit
double-precision data types, on a Pentium IV with MS Windows XP. The accuracy of
each algorithm has been quantified by means of the relative rms error (according
to [5]), testing the algorithms on h random complex-valued signals s_i (h
between 10^2 and 10^3, depending on N), with -0.5 < Re[s_i(n)] < 0.5 and
-0.5 < Im[s_i(n)] < 0.5 for all n:

\[ \mathrm{relative\_rms\_error} = \frac{1}{h} \sum_{i=1}^{h} \frac{\| S_i - S_{exact\_i} \|}{\| S_{exact\_i} \|} \tag{32} \]

In (32) the Euclidean norm is used, S_exact_i = exact_CDFT[s_i] is computed
using quadruple precision, and S_i = approx_CDFT[s_i] is the estimated output
signal obtained using the FFT algorithm under test and double-precision
computation. We highlight that, in the S_i = approx_CDFT[s_i] computation, every
double-precision trigonometric constant has been pre-computed as a
quadruple-precision value and then rounded to double precision. In this
manner both the array of trigonometric constants and the tested FFT algorithm
have the best accuracy attainable within the limit of 64-bit storage (this
optimal accuracy cannot be reached directly using the default 64-bit sine and
cosine functions). This is an interesting aspect since, if we computed the
required trigonometric constants remaining in double precision (without passing
through quadruple precision), then the numerical error of the classical and
improved algorithms would grow by about 22% and 25% respectively for N = 2^16.
Fig. 1 shows that the improved QFT improves on the accuracy of the classical QFT
by about 4-9% (depending on N). Moreover, for small N the accuracies of the
improved QFT and of split-radix 3add-3mul are nearly the same. Unfortunately
both the classical and the improved QFT have a numerical error that grows faster
than that of split-radix 3mul-3add. However we think that this is often not such
a relevant disadvantage. In fact, in many applications, we are interested only
in the first three decimal digits of the output values S(k) and, in such a
context, a relative_rms_error equal to 10^-14 or 10^-16 is nearly the same.
Thus we can use the improved QFT in such applications even if N is large.
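
A metric of this kind can be sketched in a few lines of Python. The function below is an illustration, not the authors' Scilab code: it assumes the error is the Euclidean norm of the difference divided by the norm of the reference spectrum, averaged over the test signals, and it takes the reference spectra as given (the paper computes them in quadruple precision, which the standard library does not offer). The names `norm2` and `relative_rms_error` are hypothetical.

```python
import math

def norm2(v):
    """Euclidean norm of a sequence of real or complex values."""
    return math.sqrt(sum(abs(z) ** 2 for z in v))

def relative_rms_error(approx_spectra, exact_spectra):
    """Average of ||S_i - S_exact_i|| / ||S_exact_i|| over all test signals."""
    ratios = [
        norm2([a - e for a, e in zip(S, S_exact)]) / norm2(S_exact)
        for S, S_exact in zip(approx_spectra, exact_spectra)
    ]
    return sum(ratios) / len(ratios)

# Tiny hand-made example: one spectrum with a single perturbed component.
exact = [[3 + 4j, 0j]]
approx = [[3 + 4j, 0.5 + 0j]]
err = relative_rms_error(approx, exact)   # 0.5 / 5
```

In a real test the approximate spectra would come from the double-precision FFT under test and the reference spectra from a higher-precision computation.
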
Figure 1: Comparison in numerical accuracy of improved QFT (new_QFT),
classical QFT (cla_QFT) and split-radix (SR) 3mul-3add



Table 2: Computational cost required for various sinusoidal transforms by means
of the improved QFT, as a function of their periodization N.

transform   multiplications              sums                                  flop
CDFT        N log(N) - 3N + 4            3N log(N) - 3N + 4                    4N log(N) - 6N + 8
RDFT        (1/2)N log(N) - (3/2)N + 2   (3/2)N log(N) - (5/2)N + 4            2N log(N) - 4N + 6
DCT-0       (1/4)N log(N) - (3/4)N + 1   (3/4)N log(N) - (7/4)N + log(N) + 3   N log(N) - (5/2)N + log(N) + 4
DST-0       (1/4)N log(N) - (3/4)N + 1   (3/4)N log(N) - (7/4)N - log(N) + 3   N log(N) - (5/2)N - log(N) + 4
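
The CDFT row of Tab. 2 can be cross-checked against the new_QFT columns of Tab. 3, 4 and 5. The Python sketch below uses a hypothetical function name and takes the logarithm in base 2.

```python
def improved_qft_cdft_cost(N):
    """Return (mul, add, flop) of the improved QFT for a CDFT of length N.

    N is assumed to be a power of two; r = log2(N).
    """
    r = N.bit_length() - 1
    mul = N * r - 3 * N + 4
    add = 3 * N * r - 3 * N + 4
    flop = 4 * N * r - 6 * N + 8     # equals mul + add
    return mul, add, flop

for N in (4, 8, 16, 32, 64):
    print(N, improved_qft_cdft_cost(N))
```

For N = 16 this gives 20 multiplications, 148 additions and 168 flops, the same values tabulated for split-radix 3add/3mul.
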



7 Conclusions 

Adding an appropriate intermediate decomposition to the classical QFT
algorithm produces a more efficient QFT algorithm, with the same computational
cost as the celebrated split-radix 3add/3mul, while keeping the lower number
of trigonometric constants that the classical QFT has with respect to
split-radix 3add/3mul. These characteristics make the improved QFT algorithm a
good choice for CDFT, RDFT, DCT, DST computation in fixed-point
implementations, where a multiplication is slower than an addition, or in a
parallel pipeline hardware implementation. Moreover, an efficient 'in place'
implementation of the improved QFT can be the object of future research, since
it is possible but not yet available.
Table 3: Comparative evaluation of the number of sums required for CDFT
calculation for the split-radix 3add/3mul algorithm (SR_3/3), the classical QFT
algorithm (clas_QFT) and the improved QFT algorithm (new_QFT)

N       SR_3/3   clas_QFT   new_QFT
4       16       16         16
8       52       52         52
16      148      160        148
32      388      432        388
64      964      1088       964
128     2308     2624       2308
256     5380     6144       5380
512     12292    14080      12292
1024    27652    31744      27652
2048    61444    70656      61444



Table 4: Comparative evaluation of the number of multiplications required for
CDFT calculation for the split-radix 3add/3mul algorithm (SR_3/3), the classical
QFT algorithm (clas_QFT) and the improved QFT algorithm (new_QFT)

N       clas_QFT   new_QFT   SR_3/3
4       0          0         0
8       4          4         4
16      22         20        20
32      74         68        68
64      210        196       196
128     546        516       516
256     1346       1284      1284
512     3202       3076      3076
1024    7426       7172      7172
2048    16898      16388     16388


Appendix: Advantages of signal types and notation

As mentioned in sect. 2.2, the use of 'signal types' has numerous advantages.
In certain respects these advantages are also reinforced by the particular
notation used to indicate the various types of signals created. These advantages
concern three aspects: implementation of the theoretical algorithm in a
programming language, ideation and development of a recursive version of the
algorithm, theoretical description of the algorithm. On the other hand, this
approach has just a few disadvantages:

• the reader has to spend time understanding this atypical approach
to describing FFT algorithms.

• signal names are long (this disadvantage is specific to the particular
notation used, but it is not intrinsic to the use of signal types).

• we need to keep an eye on Tab. 1 while we read the paper (even though we have
used a mnemonic notation).
Table 5: Comparative evaluation of the number of flops required for CDFT
calculation for the split-radix 3add/3mul algorithm (SR_3/3), the classical QFT
algorithm (clas_QFT) and the improved QFT algorithm (new_QFT)

N       SR_3/3   clas_QFT   new_QFT
4       16       16         16
8       56       56         56
16      168      182        168
32      456      506        456
64      1160     1298       1160
128     2824     3170       2824
256     6664     7490       6664
512     15368    17282      15368
1024    34824    39170      34824
2048    77832    87554      77832


A.1 Advantages concerning the implementation of the theoretical algorithm in a
programming language

Implementation aspects are usually neglected in the theoretical description of
FFT algorithms in the literature. Identifying the signal types (referred to with
the notation) already in the theoretical description of the algorithm has the
significant advantage of making the theoretical algorithm 'ready to be
implemented' for any reader who wants to implement it in the programming
language of their choice. This characteristic minimizes the time, and hence the
cost, of implementing (coding) the algorithm in any programming language that
requires memory management to allocate the signals created by the algorithm
into memory cells. In fact, all we need to manage the signals used (except
mathematical details) is written in Tab. 1, 6. In general, this advantage
is quantitatively more important if we implement many algorithms, or a few
algorithms with many functions, or algorithms with many different signals but
with a few signal types. In particular, this advantage shows up in these aspects
(during the writing phase of the code that implements the theoretical algorithm):

1. we do not need to compute the number of array cells needed to hold each
stored signal every time we handle a new signal, because this has already
been done in Tab. 1, provided in the theoretical description of the algorithm.
This number of array cells is max(ln(s), lk(s)).

2. we just need to determine the matches between the theoretical signal
(temporal or frequency-domain) components and the array of memory cells which
contain the residual temporal components or the required frequency-domain
components, inside the area of memory reserved for the signal, only once for
each signal type used, instead of as many times as a signal type appears in the
algorithm. In fact the matches found are reusable each time a previously used
signal type appears. Tab. 6 (which we obtain from Tab. 1) shows these matches
for each signal type (these matches are valid only if we store indices in
increasing order, for programming languages, such as Scilab-Matlab, where the
first cell of an array is p = 1 and each cell of this array can contain a real
or a complex value).

3. the code is much more readable, since the array-signal names let the reader
of the code immediately identify all relevant characteristics of the signal
stored in each array.

4. the debugging time of the code which implements the algorithm decreases,
thanks to the use of Tab. 1, 6.

5. the time we spend writing comments in the code decreases considerably. In
fact:

• we can write the comments in the dst_oo (dst_ot) function just by copying them
from the ones already written in the dct_oo (dct_ot) function, substituting
the dc subscript with ds, thanks to the parallelism between these two
functions, and thanks to the notation used for signal names.

• we can write the comments in the dst_ot function just by copying them from the
ones already written in the dct_cla function, substituting the sto_n
subscript from 't' to 'o', thanks to the notation used for signal names,
since these two functions apply the same chain of elaborations (only the
signal types involved change).



A.2 Advantages concerning the ideation and development of the recursive version
of improved QFT

Determining the type of each created signal facilitated the development of the
new improved QFT algorithm in the theoretical stage. In fact:

• we immediately know whether a signal created in a function has the same
characteristics as another signal already created in another function
(and therefore we already know how to manage it efficiently, using
the procedure already used in other functions).

• knowing the type of every created child signal lets us check whether every
recursive function keeps the total output ln (lk) parameters of the
descendant signals (obtained by summing the ln (lk) parameters of the two
signals) identical to the ones of the single input signal of the function.
This aspect is important, since the lack of this feature in the dct_to_cla
function makes the classical QFT neither efficient nor in-place (as shown in
sect. 4.9).



A.3 Advantages concerning the theoretical description of the algorithm in a
paper

This category includes these benefits: 

1. High intelligibility of the description of algorithms, since: 

• each signal name (sequence of symbols) has the same meaning 
everywhere in this paper. 

• the reader knows every detail of how the algorithm acts, at 
every step of the algorithm. In fact, in any mathematical relationship 
described (for example eq. (19)), the reader can obtain all relevant 
information about the signals involved from their names, using 
Tab. 1. 

• the notation used makes the characteristics of any signal type mnemonic. 

• signal types make explicit the subtleties (the differences in the sto_n 
and sto_k groupings of the signals created by the two algorithms shown in 
this paper) that cause the greater efficiency (in both computational 
cost and memory requirements) of the improved QFT, compared 
to the classical version of the QFT (details are explained in sect. 6.1). 



2. Compactness in exposition of all details of any used elaboration. In fact: 

• The use of signal types lets us detect identical functions in distinct 
algorithms, and thus avoid describing them several times in the paper. 
Understanding whether two functions are identical is not trivial. In 
fact, using the same chain of elaborations is not enough to make two 
functions identical: they also have to apply to the same signal type. 
For example, the dct_cla and dct_ot functions described in this paper use 
the same chain of elaborations, but they are different functions, since 
they apply to different input signals (s_dc_tt and s_dc_ot respectively). 

• Understanding the mathematical relationships used (for example eq. 
(19)) does not require additional comments written in natural language, 
if every signal is described by the notation (except for the indication 
of the periodization of any signal present). In fact, each additional 
line of text concerning the description of the signals involved is 
superfluous. For example, we don't need to comment: 'for this signal 
we are interested in calculating frequency-domain components only 
for even harmonics k ∈ {0, 2, ...}, through DCT', because we 
can directly deduce it from Tab. 1. 

• The particular notation used to describe signal types lets us describe 
a function just by stating that it is analogous to another function and 
describing how the signal types change, without losing any detail. For 
example, in sect. 5.2.4 we can describe the dst_ot and dst_oo functions 
using only 4 lines of text, stating that these functions use the same 
chain of elaborations used in the dct_ot and dct_oo functions respectively, 
but applied to different root signal types, and thus we just need to 
replace the symbol dc with the symbol ds in any created signal type. 
It is important to note that the statement 'we just need to replace 
everywhere the symbol 'dc' with the symbol 'ds'' gives us much more 
information than the sentence 'just replace the DCT calculation with 
the DST calculation'. In fact, the latter sentence doesn't tell us whether 
this analogy keeps unaltered, or changes, the sto_n or sto_k groups 
of the signals involved in the two functions. 
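
The renaming rule can be mimicked mechanically, as in the following minimal Python sketch. It assumes that signal types are written as strings of the form s_<kind>_<groups> (for example s_dc_ot); the helper name to_dst_type is hypothetical and is introduced only for illustration.

```python
def to_dst_type(signal_type: str) -> str:
    """Replace the 'dc' field with 'ds', leaving the grouping field untouched."""
    prefix, kind, groups = signal_type.split("_", 2)
    if kind != "dc":
        raise ValueError(f"expected a 'dc' signal type, got {signal_type!r}")
    # The grouping field (sto_n and sto_k symbols) is copied verbatim,
    # so the analogy cannot silently change the groups.
    return "_".join([prefix, "ds", groups])


print(to_dst_type("s_dc_ot"))  # s_ds_ot
print(to_dst_type("s_dc_oo"))  # s_ds_oo
```

Because only the kind field is rewritten, the sketch makes visible what the sentence 'just replace the DCT calculation with the DST calculation' leaves unstated: the sto_n and sto_k groups are carried over unchanged.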

3. The possibility of using a graphical and intuitive description of the 
algorithm: the decomposition tree (see Fig. 2 and 3), where we report the 
concatenation of the processing performed and the signal types to which 
it applies. Note that the information conveyed by the decomposition tree 
would be much lower if we inserted signals with random names in this 
graph. 

The decomposition tree is useful because, combined with Tab. 1, it shows 
how to manage the memory area reserved for data (dividing it among the 
various signals at each level of decomposition). In fact, 
if we want, we can sort the signals in the memory area in the same order 
they occur in the decomposition tree. Thus we know where each signal is 
located in memory. 

Table 6: a possible matching between (theoretical type of signal) and (array of 
memory cells), in an implementation where each signal is stored in a contiguous 
sequence of cells (array), where the indices n and k are stored in growing order, 
and the first cell of an array has index p = 1 

signal type   temporal signal-array matching          transformed signal-array matching

s_cx_tt       s_cx_tt(n) = s_cx_tt_arr(n+1)           S_cx_tt(k) = S_cx_tt_arr(k+1)
s_re_tt       s_re_tt(n) = s_re_tt_arr(n+1)           S_re_tt(k) = S_re_tt_arr(k+1)
s_dc_tt       s_dc_tt(n) = s_dc_tt_arr(n+1)           S_dc_tt(k) = S_dc_tt_arr(k+1)
s_dc_et       s_dc_et(n) = s_dc_et_arr((n+2)/2)       S_dc_et(k) = S_dc_et_arr(k+1)
s_dc_ot       s_dc_ot(n) = s_dc_ot_arr((n+1)/2)       S_dc_ot(k) = S_dc_ot_arr(k+1)
s_dc_te       s_dc_te(n) = s_dc_te_arr(n+1)           S_dc_te(k) = S_dc_te_arr((k+2)/2)
s_dc_to       s_dc_to(n) = s_dc_to_arr(n+1)           S_dc_to(k) = S_dc_to_arr((k+1)/2)
s_dc_oe       s_dc_oe(n) = s_dc_oe_arr((n+1)/2)       S_dc_oe(k) = S_dc_oe_arr((k+2)/2)
s_dc_oo       s_dc_oo(n) = s_dc_oo_arr((n+1)/2)       S_dc_oo(k) = S_dc_oo_arr((k+1)/2)
s_dc_t1e      s_dc_t1e(n) = s_dc_t1e_arr(n+1)         S_dc_t1e(k) = S_dc_t1e_arr((k+2)/2)
s_dc_t1t      s_dc_t1t(n) = s_dc_t1t_arr(n+1)         S_dc_t1t(k) = S_dc_t1t_arr(k+1)
s_ds_tt       s_ds_tt(n) = s_ds_tt_arr(n)             S_ds_tt(k) = S_ds_tt_arr(k)
s_ds_et       s_ds_et(n) = s_ds_et_arr(n/2)           S_ds_et(k) = S_ds_et_arr(k)
s_ds_ot       s_ds_ot(n) = s_ds_ot_arr((n+1)/2)       S_ds_ot(k) = S_ds_ot_arr(k)
s_ds_te       s_ds_te(n) = s_ds_te_arr(n)             S_ds_te(k) = S_ds_te_arr(k/2)
s_ds_to       s_ds_to(n) = s_ds_to_arr(n)             S_ds_to(k) = S_ds_to_arr((k+1)/2)
s_ds_oe       s_ds_oe(n) = s_ds_oe_arr((n+1)/2)       S_ds_oe(k) = S_ds_oe_arr(k/2)
s_ds_oo       s_ds_oo(n) = s_ds_oo_arr((n+1)/2)       S_ds_oo(k) = S_ds_oo_arr((k+1)/2)
s_ds_t1o      s_ds_t1o(n) = s_ds_t1o_arr(n)           S_ds_t1o(k) = S_ds_t1o_arr((k+1)/2)
s_ds_e1o      s_ds_e1o(n = N/4) = s_ds_e1o_arr(1)     S_ds_e1o(k = 1) = S_ds_e1o_arr((k+1)/2)
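
The index matching of Table 6 and the tree-ordered memory layout described above can be sketched as follows. This is a minimal Python illustration, not the paper's implementation: it assumes, as the s_dc_tt and s_ds_tt rows of Table 6 suggest, that 'cx', 're' and 'dc' signals start at theoretical index 0 while 'ds' signals start at index 1. The function names array_index and layout, and the signal names and lengths in the usage lines, are hypothetical.

```python
def array_index(i, group, first):
    """Return the 1-based array cell that stores theoretical index i.

    group: 't' (all indices), 'e' (even indices only), 'o' (odd indices only).
    first: smallest theoretical index of the signal
           (assumed 0 for 'dc' signals, 1 for 'ds' signals).
    """
    if group == "t":
        start, step = first, 1
    elif group in ("e", "o"):
        parity = 0 if group == "e" else 1
        # smallest index of the required parity that is >= first
        start = first if first % 2 == parity else first + 1
        step = 2
    else:
        raise ValueError(f"unknown group: {group!r}")
    if i < start or (i - start) % step != 0:
        raise ValueError(f"index {i} does not belong to group {group!r}")
    return (i - start) // step + 1


def layout(signals):
    """Give each (name, length) pair its 1-based start cell, in the given order."""
    offsets, next_free = {}, 1
    for name, length in signals:
        offsets[name] = next_free
        next_free += length
    return offsets


print(array_index(3, "t", 0))  # 4, i.e. n + 1
print(array_index(4, "e", 0))  # 3, i.e. (k + 2) / 2
print(array_index(4, "e", 1))  # 2, i.e. n / 2
print(layout([("first_child", 5), ("second_child", 4)]))
```

Passing the signals to layout in the order in which they appear at one level of the decomposition tree yields the start cell of each signal, and array_index then locates a single sample inside its array.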
Figure 2: The decomposition tree of the classical QFT 